127 research outputs found
Joint segmentation of many aCGH profiles using fast group LARS
Array-Based Comparative Genomic Hybridization (aCGH) is a method used to
search for genomic regions with copy number variations. For a given aCGH
profile, one challenge is to accurately segment it into regions of constant
copy number. Subjects sharing the same disease status, for example a type of
cancer, often have aCGH profiles with similar copy number variations, due to
duplications and deletions relevant to that particular disease. We introduce a
constrained optimization algorithm that jointly segments aCGH profiles of many
subjects. It simultaneously penalizes the amount of freedom the set of profiles
has to jump from one level of constant copy number to another, at genomic
locations known as breakpoints. We show that breakpoints shared by many
different profiles tend to be found first by the algorithm, even in the
presence of significant amounts of noise. The algorithm can be formulated as a
group LARS problem. We propose an extremely fast way to find the solution path,
i.e., a sequence of shared breakpoints in order of importance. For no extra
cost the algorithm smooths all of the aCGH profiles into piecewise-constant
regions of equal copy number, giving low-dimensional versions of the original
data. These can be shown for all profiles on a single graph, allowing for
intuitive visual interpretation. Simulations and an implementation of the
algorithm on bladder cancer aCGH profiles are provided.
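The claim that breakpoints shared by many profiles stand out even under noise can be illustrated with a minimal sketch. It is not the group LARS algorithm described in the abstract: it swaps in a plain CUSUM scan, summed over profiles, that scores a single candidate breakpoint. The function names and the simulated data are hypothetical.

```python
import random


def shared_breakpoint_scores(profiles):
    """Score each candidate breakpoint position across all profiles.

    profiles: list of equal-length lists of floats, one list per subject.
    Returns a list of length n - 1; entry i - 1 scores a jump between
    0-based positions i - 1 and i, summed over profiles.
    """
    n = len(profiles[0])
    scores = [0.0] * (n - 1)
    for y in profiles:
        total = sum(y)
        left_sum = 0.0
        for i in range(1, n):
            left_sum += y[i - 1]
            left_mean = left_sum / i
            right_mean = (total - left_sum) / (n - i)
            # Squared CUSUM statistic: weighted gap between the two segment means.
            weight = i * (n - i) / n
            scores[i - 1] += weight * (left_mean - right_mean) ** 2
    return scores


def best_shared_breakpoint(profiles):
    """Return the 0-based index of the first position after the best-scoring jump."""
    scores = shared_breakpoint_scores(profiles)
    return max(range(len(scores)), key=scores.__getitem__) + 1


def make_profiles(num_profiles=20, length=100, jump_at=60, noise=1.0, seed=0):
    """Simulate profiles that all jump at the same position (hypothetical data)."""
    rng = random.Random(seed)
    profiles = []
    for _ in range(num_profiles):
        height = rng.choice([-1.0, 1.0])  # gain or loss, differing by subject
        profiles.append(
            [(height if t >= jump_at else 0.0) + rng.gauss(0.0, noise)
             for t in range(length)]
        )
    return profiles


if __name__ == "__main__":
    print(best_shared_breakpoint(make_profiles()))
```

Because the squared statistic is summed over subjects, gains and losses at the same location reinforce each other rather than cancel, which is the intuition behind pooling profiles.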
The group fused Lasso for multiple change-point detection
We present the group fused Lasso for detection of multiple change-points
shared by a set of co-occurring one-dimensional signals. Change-points are
detected by approximating the original signals with a constraint on the
multidimensional total variation, leading to piecewise-constant approximations.
Fast algorithms are proposed to solve the resulting optimization problems,
either exactly or approximately. Conditions are given for consistency of both
algorithms as the number of signals increases, and empirical evidence is
provided to support the results on simulated and array comparative genomic
hybridization data.
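As a rough illustration of the multidimensional total variation mentioned above, the sketch below evaluates a penalized objective for a candidate approximation. It assumes an unweighted penalty and a Lagrangian (penalized rather than constrained) form; the function names and the parameter `lam` are hypothetical, and no solver is included.

```python
import math


def multidim_total_variation(signals):
    """Sum over positions of the Euclidean norm of the jump across all signals.

    signals: list of equal-length lists of floats, one list per signal.
    """
    n = len(signals[0])
    total = 0.0
    for i in range(n - 1):
        # One jump vector per position, with one coordinate per signal.
        total += math.sqrt(sum((s[i + 1] - s[i]) ** 2 for s in signals))
    return total


def penalized_objective(observed, approx, lam):
    """Squared approximation error plus lam times the multidimensional total variation."""
    squared_error = sum(
        (y - u) ** 2
        for ys, us in zip(observed, approx)
        for y, u in zip(ys, us)
    )
    return squared_error + lam * multidim_total_variation(approx)


if __name__ == "__main__":
    shared = [[0.0, 0.0, 3.0, 3.0], [0.0, 0.0, 4.0, 4.0]]
    separate = [[0.0, 3.0, 3.0, 3.0], [0.0, 0.0, 4.0, 4.0]]
    print(multidim_total_variation(shared), multidim_total_variation(separate))
```

Taking the norm across signals at each position means jumps that occur at the same location are penalized less than the same jumps at different locations, which favours shared change-points.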
Maturity Mismatch and Financial Crises: Evidence from Emerging Market Corporations
Substantial attention has been paid in recent years to the risk of maturity mismatch in emerging markets. Although this risk is microeconomic in nature, the evidence advanced thus far has taken the form of macro correlations. This paper empirically evaluates this mechanism at the micro level by using a database of over 3,000 publicly traded firms from fifteen emerging markets. The paper measures the risk of short-term exposure by estimating, at the firm level, the effect on investment of the interaction of short-term exposure and aggregate capital flows. This effect is (statistically) zero, contrary to the prediction of the maturity-mismatch hypothesis. This conclusion is robust to using a variety of different estimators, alternative measures of capital flows, and controls for devaluation effects and access to international capital. The paper finds evidence that short-term-exposed firms pay higher financing costs and liquidate assets at fire sale prices, but the paper does not find that this reduction in net worth translates into a drop in investment.
Corporate Dollar Debt and Depreciations: Much Ado About Nothing?
Much has been written recently about the problems for emerging markets that might result from a mismatch between foreign-currency-denominated liabilities and assets (or income flows) denominated in local currency. In particular, several models, developed in the aftermath of the financial crises of the late 1990s, suggest that the expansion in the "peso" value of "dollar" liabilities resulting from a devaluation could, via a net worth effect, offset the expansionary competitiveness effect. Assessing which effect dominates is ultimately an empirical matter. In this vein, this paper constructs a new database with accounting information (including the currency composition of liabilities) for over 450 non-financial firms in five Latin American countries. The authors estimate, at the firm level, the reduced-form effect on investment of holding foreign-currency-denominated debt during an exchange-rate realignment. It is consistently found that, contrary to the predicted sign of the net-worth effect, firms holding more dollar debt do not invest less than their counterparts in the aftermath of a depreciation. The paper shows that this result is due to firms matching the currency denomination of their liabilities with the exchange-rate sensitivity of their profits. Because of this matching, the negative balance-sheet effects of a depreciation on firms holding dollar debt are offset by the larger competitiveness gains of these firms.
Long signal change-point detection
The detection of change-points in a spatially or time ordered data sequence
is an important problem in many fields such as genetics and finance. We derive
the asymptotic distribution of a statistic recently suggested for detecting
change-points. Simulation of its estimated limit distribution leads to a new
and computationally efficient change-point detection algorithm, which can be
used on very long signals. We assess the algorithm via simulations and on
previously benchmarked real-world data sets.
The Statistical Performance of Collaborative Inference
The statistical analysis of massive and complex data sets will require the
development of algorithms that depend on distributed computing and
collaborative inference. Inspired by this, we propose a collaborative framework
that aims to estimate the unknown mean of a random variable. In
the model we present, a certain number of calculation units, distributed across
a communication network represented by a graph, participate in the estimation
of this mean by sequentially receiving independent observations of the random
variable while exchanging messages via a stochastic matrix defined over the
graph. We give precise conditions on this matrix under which the statistical
precision of the individual units is comparable to that of a (gold standard)
virtual centralized estimate, even though each unit does not have access to all
of the data. We show in particular the fundamental role played by both the
non-trivial eigenvalues of this matrix and the Ramanujan class of expander
graphs, which provide remarkable performance for moderate algorithmic cost.
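The setting described above can be sketched in a few lines, under assumptions that are not taken from the abstract: the update rule (mix neighbours' estimates through the matrix, then fold in the newly received observation as a running mean), the ring-shaped network, and all function names are hypothetical choices made for illustration.

```python
import random


def ring_matrix(num_units):
    """Doubly stochastic matrix for a ring: each unit averages itself and its two neighbours."""
    a = [[0.0] * num_units for _ in range(num_units)]
    for i in range(num_units):
        for j in (i - 1, i, i + 1):
            a[i][j % num_units] += 1.0 / 3.0
    return a


def collaborative_mean(data, matrix):
    """Return each unit's estimate after processing all rounds.

    data[t][i] is the observation received by unit i at round t.
    Assumed update: exchange estimates via the matrix, then average in the new observation.
    """
    num_units = len(matrix)
    estimates = list(data[0])
    for t in range(1, len(data)):
        mixed = [
            sum(matrix[i][j] * estimates[j] for j in range(num_units))
            for i in range(num_units)
        ]
        # After t earlier rounds, the new observation gets weight 1 / (t + 1).
        estimates = [(t * mixed[i] + data[t][i]) / (t + 1) for i in range(num_units)]
    return estimates


if __name__ == "__main__":
    rng = random.Random(0)
    units, rounds = 10, 200
    data = [[rng.gauss(5.0, 1.0) for _ in range(units)] for _ in range(rounds)]
    print(collaborative_mean(data, ring_matrix(units)))
```

With a doubly stochastic matrix, the average of the units' estimates equals the mean of all observations received so far, so the centralized estimate is preserved on average. How close each individual unit gets to it depends on how fast the matrix mixes, which is where its non-trivial eigenvalues come in.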
Progress and open challenges in extremely high-dimensional medical outcome prediction
Using biological data for medical decisions requires "extremely high" prediction accuracy; mistakes can lead to death.
Between-Subject and Within-Subject Model Mixtures for Classifying HIV Treatment Response
We present a method for using longitudinal data to classify individuals into clinically relevant population subgroups. This is achieved by treating "subgroup" as a categorical covariate whose value is unknown for each individual, and predicting its value using mixtures of models that represent "typical" longitudinal data from each subgroup. Under a nonlinear mixed effects model framework, two types of model mixtures are presented, both of which have their advantages. Following illustrative simulations, longitudinal viral load data for HIV-positive patients is used to predict whether they are responding (completely, partially, or not at all) to a new drug treatment.
Automatic data binning for improved visual diagnosis of pharmacometric models
Visual Predictive Checks (VPC) are graphical tools to help decide whether a given model could have plausibly generated a given set of real data. Typically, time-course data is binned into time intervals, then statistics are calculated on the real data and data simulated from the model, and represented graphically for each interval. Poor selection of bins can easily lead to incorrect model diagnosis. We propose an automatic binning strategy that improves reliability of model diagnosis using VPC. It is implemented in version 4 of the Monolix software.